AMENDMENT


This listing of claims will replace all prior versions, and listings, of claims in the
application:

Listing of Claims:

1. (Currently Amended) A method for segmenting multi-speaker speech data by
speaker, the method comprising:

detecting speaker changes in multi-speaker speech data to obtain an initial
segmentation of the multi-speaker speech data, wherein estimated segments are generated
by the detected speaker changes;

clustering the estimated segments into groups of estimated segments, wherein
each group of estimated segments is associated with a single speaker;

checking whether segments in a first group of estimated segments overlap
segments in a second group of estimated segments, wherein if segments of the first group
overlap with segments of the second group, then the method comprises pooling the first
and second group; and

modeling and resegmenting any pooled groups and remaining groups
of estimated segments to obtain stable segmentations.
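
By way of non-limiting illustration only, the overlap check and pooling step recited in claim 1 might be sketched in Python as follows. The (start, end) interval representation, the half-open overlap test, and all function names are assumptions made for this sketch; they do not appear in the application.

```python
def segments_overlap(a, b):
    """True when two (start, end) intervals share any stretch of time."""
    return a[0] < b[1] and b[0] < a[1]


def groups_overlap(first, second):
    """True when any segment of one group overlaps any segment of the other."""
    return any(segments_overlap(a, b) for a in first for b in second)


def pool_overlapping(groups):
    """Repeatedly pool pairs of groups that overlap until none remain."""
    pooled = [list(g) for g in groups]
    merged = True
    while merged:
        merged = False
        for i in range(len(pooled)):
            for j in range(i + 1, len(pooled)):
                if groups_overlap(pooled[i], pooled[j]):
                    pooled[i] = sorted(pooled[i] + pooled[j])
                    del pooled[j]
                    merged = True
                    break
            if merged:
                break
    return pooled


# Hypothetical groups of (start, end) times in seconds.
groups = [[(0, 10), (30, 40)], [(8, 15)], [(50, 60)]]
pooled = pool_overlapping(groups)
```

In this sketch the first two groups are pooled because (0, 10) and (8, 15) overlap, while the third group remains on its own; the pooled and remaining groups would then be modeled and resegmented.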

2. (Original) A method as defined in claim 1, wherein detecting speaker changes in
multi-speaker speech data to obtain an initial segmentation of the multi-speaker speech data
further comprises performing a front-end analysis on the multi-speaker speech data.


Application/Control Number: 10/350,727    Docket No.: 2(>02-<*K;6

Art Unit: 2662


3. (Original) A method as defined in claim 1, wherein detecting speaker changes in
multi-speaker speech data to obtain an initial segmentation of the multi-speaker speech data
further comprises at least one of:

detecting speaker changes using Bayes Information Criterion; and

detecting speaker changes using a generalized likelihood ratio formulation,
wherein a speaker change occurs when the generalized likelihood ratio formulation
exhibits a dip.

4. (Original) A method as defined in claim 1, wherein detecting speaker changes in
multi-speaker speech data to obtain an initial segmentation of the multi-speaker speech data
further comprises estimating speaker segments by detecting dips in the generalized likelihood
ratio formulation.

5. (Original) A method as defined in claim 1, wherein detecting speaker changes in 
multi-speaker speech data to obtain an initial segmentation of the multi-speaker speech data 
further comprises estimating speaker segments when the generalized likelihood ratio formulation 
remains above a specified threshold for a particular duration. 

6. (Original) A method as defined in claim 1, wherein detecting speaker changes in
multi-speaker speech data to obtain an initial segmentation of the multi-speaker speech data 
further comprises determining a location of a boundary between speaker segments by calculating 
the generalized likelihood ratio formulation over successive overlapping windows throughout the 
multi-speaker speech data. 
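
By way of non-limiting illustration only, the dip detection of claims 3, 4 and 6 might be sketched in Python as follows. The sketch scores each candidate boundary with a log generalized likelihood ratio computed over successive overlapping windows. It assumes one-dimensional frames and a single maximum-likelihood Gaussian per window in place of whatever models an actual implementation would use; the window size, threshold, and function names are likewise assumptions.

```python
import math
from statistics import pvariance


def log_glr(left, right, floor=1e-6):
    """Log GLR of 'one speaker in the window' against 'a change at the midpoint'.

    With maximum-likelihood Gaussians the constant terms cancel, leaving only
    (n / 2) * log(variance) terms. The value is near zero when both halves look
    alike and strongly negative (a dip) when they differ.
    """
    def half_n_log_var(samples):
        return 0.5 * len(samples) * math.log(max(pvariance(samples), floor))

    return half_n_log_var(left) + half_n_log_var(right) - half_n_log_var(left + right)


def glr_curve(frames, half_window):
    """Score every candidate boundary t using the frames on either side of it."""
    return {
        t: log_glr(frames[t - half_window:t], frames[t:t + half_window])
        for t in range(half_window, len(frames) - half_window + 1)
    }


def find_dips(curve, threshold):
    """Return boundaries whose score is a strict local minimum below threshold."""
    dips = []
    for t, score in curve.items():
        left = curve.get(t - 1, math.inf)
        right = curve.get(t + 1, math.inf)
        if score < threshold and score < left and score < right:
            dips.append(t)
    return dips


# Synthetic frames: one "speaker" for 200 frames, then a second one.
frames = [0.0, 1.0] * 100 + [10.0, 11.0] * 100
curve = glr_curve(frames, half_window=20)
dips = find_dips(curve, threshold=-10.0)
```

On this synthetic input the curve is flat away from the change and reaches its lowest value at frame 200, where the two halves of the window come from different sources.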

7. (Original) A method as defined in claim 1, wherein clustering the estimated segments
into groups of segments further comprises applying an agglomerative hierarchical clustering
procedure to obtain an initial grouping of segments.

8. (Original) A method as defined in claim 1, wherein clustering the estimated segments
into groups of estimated segments further comprises clustering the estimated segments into
groups of estimated segments until all of the estimated segments are merged into a final group,
wherein the final group includes one or more clusters that correspond to the groups of estimated
segments and wherein each cluster corresponds to a single speaker.

9. (Previously Presented) A method as defined in claim 8, further comprising
identifying the one or more clusters in the final group empirically.

10. (Original) A method as defined in claim 1, wherein each estimated segment is
initially in a separate group of estimated segments, wherein clustering the estimated segments
into groups of estimated segments further comprises:

modeling each estimated segment by a low-order Gaussian mixture model;

generating table of pairwise distances using the low-order Gaussian mixture
models, wherein the table of pairwise distances includes a distance between each
estimated segment and every other estimated segment; and

merging at least two groups of estimated segments to produce a new group of
estimated segments such that a merger of the at least two groups of estimated segments
produces a smallest increase in the distance.
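
By way of non-limiting illustration only, the agglomerative clustering of claims 7 through 10 might be sketched in Python as follows. The sketch substitutes a single one-dimensional Gaussian for the low-order Gaussian mixture model of claim 10, and it takes the "distance" to be the loss in maximized log-likelihood incurred by merging two groups. Both choices, the stopping rule (a target number of clusters), and all names are assumptions made for this sketch.

```python
import math
from statistics import pvariance


def neg_loglik(samples, floor=1e-6):
    """Negative maximized Gaussian log-likelihood, up to terms linear in n."""
    return 0.5 * len(samples) * math.log(max(pvariance(samples), floor))


def merge_cost(a, b):
    """Increase in negative log-likelihood caused by modeling a and b jointly."""
    return neg_loglik(a + b) - neg_loglik(a) - neg_loglik(b)


def agglomerate(segments, n_clusters):
    """Start with one group per segment and merge the cheapest pair each round."""
    groups = [[i] for i in range(len(segments))]   # segment indices per group
    data = [list(s) for s in segments]             # pooled samples per group
    while len(groups) > n_clusters:
        # Pairwise distances between all current groups; pick the smallest.
        _, i, j = min(
            (merge_cost(data[i], data[j]), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        )
        groups[i] += groups[j]
        data[i] += data[j]
        del groups[j]
        del data[j]
    return [sorted(g) for g in groups]


# Four hypothetical segments from two sources.
segments = [[0, 1, 0, 1], [10, 11, 10, 11], [0, 1, 1, 0], [11, 10, 10, 11]]
clusters = agglomerate(segments, n_clusters=2)
```

The sketch recomputes the pairwise distances on every round for brevity; an implementation following claim 10 more literally would build the table once and update only the rows affected by each merger.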

11. (Original) A method as defined in claim 10, further comprising merging new groups
of estimated segments until all estimated segments are merged into a final group.

12. (Original) A method as defined in claim 10, wherein modeling and resegmenting
each group of estimated segments to obtain stable segmentations further comprises:

constructing a Gaussian mixture model for each group of estimated segments; and

calculating a frame-by-frame likelihood ratio detection score for each Gaussian
mixture model compared with a Gaussian mixture model representing the multi-speaker
speech data.
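
By way of non-limiting illustration only, the frame-by-frame likelihood ratio detection score of claim 12 might be sketched in Python as follows. Each model is reduced to a single one-dimensional Gaussian given as a (mean, variance) pair, which is an assumption of the sketch and a simplification of the Gaussian mixture models recited in the claim; the model values and names are hypothetical.

```python
import math


def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian (stand-in for a Gaussian mixture model)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def detection_scores(frames, group_model, background_model):
    """Per-frame log-likelihood ratio of a group model against the background."""
    return [
        gauss_logpdf(x, *group_model) - gauss_logpdf(x, *background_model)
        for x in frames
    ]


# Hypothetical (mean, variance) models: one group and the whole recording.
group_model = (0.0, 1.0)
background_model = (2.5, 7.25)
scores = detection_scores([0.0, 5.0], group_model, background_model)
```

A positive score marks a frame that the group model explains better than the model of the whole recording; a negative score marks the opposite.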

13. (Original) A method as defined in claim 1, wherein checking overlap between 
segments in each group of estimated segments further comprises: 

pooling the estimated segments that overlap; and 

modeling and resegmenting the estimated segments that overlap.

14. (Original) A method as defined in claim 1, further comprising performing
post-processing on the speaker segments by creating a segmentation lattice, wherein a best path
through the segmentation lattice is a sequence of non-overlapping estimated segments such that
an overall segmentation likelihood is maximized.
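
By way of non-limiting illustration only, the best-path search of claim 14 might be sketched in Python as follows. The sketch treats the lattice as a set of candidate (start, end, score) segments and uses the standard weighted interval scheduling recurrence to find the non-overlapping sequence with the highest total score. Treating the overall segmentation likelihood as a sum of additive per-segment scores is an assumption of the sketch, as are the names used.

```python
from bisect import bisect_right


def best_path(segments):
    """Return (total score, path) for the best non-overlapping segment sequence.

    segments: iterable of (start, end, score) tuples, with higher scores
    preferred. Segments that merely touch (end == start) are treated as
    non-overlapping.
    """
    segs = sorted(segments, key=lambda s: s[1])
    ends = [s[1] for s in segs]
    best = [(0.0, [])]  # best[i]: optimum using only the first i segments
    for i, (start, end, score) in enumerate(segs):
        # Number of earlier segments that end at or before this one starts.
        j = bisect_right(ends, start, 0, i)
        take = (best[j][0] + score, best[j][1] + [(start, end, score)])
        skip = best[i]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1]


# Hypothetical candidate segments with additive scores.
candidates = [(0, 5, 2.0), (3, 8, 5.0), (5, 10, 2.5), (8, 12, 1.0)]
total, path = best_path(candidates)
```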

15. (Original) A method as defined in claim 1, further comprising obtaining a final
segmentation by:

comparing detection scores of each group of estimated segments;

hypothesizing segment boundaries when a difference between detection scores crosses
zero; and 

accepting segments defined by the hypothesized segment boundaries if each segment has 
a duration above a duration threshold and if each segment does not cross a silence gap that is 
longer than a gap threshold. 
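
By way of non-limiting illustration only, the final segmentation steps of claim 15 might be sketched in Python as follows. Boundaries are hypothesized wherever the difference between two per-frame detection score sequences changes sign, and a segment is accepted only if it is longer than a duration threshold and contains no silence run longer than a gap threshold. The two-group setting, the per-frame silence flags, and all names and threshold values are assumptions made for this sketch.

```python
def hypothesize_boundaries(scores_a, scores_b):
    """Frame indices at which the score difference crosses zero."""
    diff = [a - b for a, b in zip(scores_a, scores_b)]
    return [t for t in range(1, len(diff)) if (diff[t - 1] > 0) != (diff[t] > 0)]


def longest_silence_run(flags):
    """Length of the longest run of consecutive silent frames."""
    longest = run = 0
    for is_silent in flags:
        run = run + 1 if is_silent else 0
        longest = max(longest, run)
    return longest


def accept_segments(boundaries, silence, min_duration, max_gap):
    """Keep segments longer than min_duration with no silence gap above max_gap."""
    edges = [0] + list(boundaries) + [len(silence)]
    accepted = []
    for start, end in zip(edges, edges[1:]):
        if end - start <= min_duration:
            continue
        if longest_silence_run(silence[start:end]) > max_gap:
            continue
        accepted.append((start, end))
    return accepted


# Hypothetical per-frame scores for two groups and per-frame silence flags.
scores_a = [1, 1, 1, -1, -1, -1, -1, 1]
scores_b = [0] * 8
silence = [False, False, False, False, True, True, True, False]
boundaries = hypothesize_boundaries(scores_a, scores_b)
segments = accept_segments(boundaries, silence, min_duration=2, max_gap=2)
```

In this example the middle segment is rejected because it spans three consecutive silent frames, and the last segment is rejected because it is only one frame long.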

16. (Original) A method as defined in claim 1, wherein the speech data is one of a
telephone conversation between two or more speakers; an archived recorded broadcast news
program; and a recorded meeting between multiple speakers.

17. (Currently Amended) A method for segmenting speech data into speaker 
segments by speaker, the method comprising:

scanning input speech data with a windowed generalized likelihood ratio (GLR) function
to obtain speech segments, wherein the input speech data includes a plurality of speakers;

clustering the speech segments into one or more clusters, wherein each cluster is 
associated with a single speaker; 

if more clusters exist than speakers, then:

checking overlap between segments in each cluster;

pooling clusters that have overlap between at least one segment in each pooled
cluster; and

resegmenting and remodeling the pooled clusters;

creating models for each cluster; and

rescanning the input speech data with the models to resegment the speech data and obtain
speech segments for each speaker included in the speech data.

18. (Original) A method as defined in claim 17, wherein scanning input speech data with
a windowed GLR function to obtain speech segments further comprises performing a front-end
analysis on the input speech sample.

19. (Original) A method as defined in claim 17, wherein scanning input speech data with
a windowed GLR function to obtain speech segments further comprises:

deriving a Gaussian Mixture Model for each window of the windowed GLR
function from a Gaussian Mixture Model of the input speech data; and

adapting component weights in each window.
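
By way of non-limiting illustration only, the weight adaptation of claim 19 might be sketched in Python as follows. The component means and variances of a model of the whole input are kept fixed, and only the component weights are re-estimated for a window, here by averaging the component posteriors over the window's frames (a single expectation-maximization style update). The claim does not specify the adaptation procedure, so this update rule, the one-dimensional components, and all names are assumptions of the sketch.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian component."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def adapt_weights(window, components, global_weights):
    """Re-estimate mixture weights for one window; means and variances stay fixed.

    Assumes every frame has non-zero density under at least one component.
    """
    totals = [0.0] * len(components)
    for x in window:
        joint = [w * gauss_pdf(x, m, v) for w, (m, v) in zip(global_weights, components)]
        norm = sum(joint)
        for k, value in enumerate(joint):
            totals[k] += value / norm  # posterior probability of component k
    return [t / len(window) for t in totals]


# Hypothetical two-component global model and a window near the first component.
components = [(0.0, 1.0), (10.0, 1.0)]
window_weights = adapt_weights([0.1, -0.2, 0.3], components, [0.5, 0.5])
```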

20. (Original) A method as defined in claim 19, wherein each window generates stable
statistics and only includes one speaker segment change. 

21. (Original) A method as defined in claim 17, wherein scanning input speech data with
a windowed GLR function to obtain speech segments further comprises estimating speech
segments by detecting dips in the windowed GLR function.

22. (Original) A method as defined in claim 17, wherein scanning input speech data with
a windowed GLR function to obtain speech segments further comprises estimating speech
segments when the windowed GLR function remains above a specified threshold for a particular
duration.

23. (Original) A method as defined in claim 17, wherein clustering the speech segments
into one or more clusters further comprises obtaining an initial grouping of speech segments
using an agglomerative hierarchical clustering procedure.

24. (Original) A method as defined in claim 23, wherein obtaining an initial grouping of
speech segments using an agglomerative hierarchical clustering procedure further comprises:

generating a table of pairwise distances that defines a distance between each
speech segment and every other speech segment; and

merging estimated segments to form groups of speech segments, wherein each
merger produces a smallest increase in distance between speech segments included in
each group of speech segments.

25. - 26. (Cancelled) 

27. (Original) A method as defined in claim 17, further comprising performing
post-processing on the speaker segments, wherein performing post-processing on the speaker
segments further comprises creating a segmentation lattice, wherein a best path through the
segmentation lattice is a sequence of non-overlapping speaker segments.

28. (Original) A method as defined in claim 17, further comprising comparing speaker
segments to a set of known target speakers to detect, label and locate their presence in speech
data.

29. (Original) A method as defined in claim 17, further comprising:
comparing detection scores of each group of estimated segments; 

hypothesizing segment boundaries when a difference between detection scores crosses 
zero; and 

accepting segments defined by the hypothesized segment boundaries if each segment has
a duration above a duration threshold and if each segment does not cross a silence gap that is
longer than a gap threshold. 

30. (Original) A method as defined in claim 17, wherein the input speech data is one of a
telephone conversation between two or more speakers; an archived recorded broadcast news
program; and a recorded meeting between multiple speakers.

31. (Currently Amended) A method for segmenting speech data by speaker, the
method comprising: 

obtaining initial estimated segments of the speech data, wherein the estimated segments
are unlabeled;

clustering the initial estimated segments until the initial estimated segments are grouped 
into a final group; 

selecting one or more clusters from the final group, wherein one or more clusters
corresponds to groups of estimated segments;

iteratively modeling and resegmenting each group of estimated segments until changes in
segment boundaries for the estimated segments in each group of estimated segments from a 
particular iteration to a next iteration are below a threshold; 